02 February 2019
The Collatz Conjecture is a problem posed by L. Collatz in 1937. Take a positive integer. If it is even, divide it by two. If it is odd, multiply by 3 and add 1. The conjecture states that if you iterate this process for any positive integer, you will always reach 1. So if we take the function:
f(x) = \begin{cases} \frac{x}{2} \qquad \text{for } x \text{ even} \\ 3x+1 \quad \text{for } x \text{ odd} \end{cases}
and iterate it for any positive integer n, we will always reach 1.
For example for 7, this looks like:
7 \rightarrow 22 \rightarrow 11 \rightarrow 34 \rightarrow 17 \rightarrow 52 \rightarrow 26 \rightarrow 13 \rightarrow 40 \rightarrow 20 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1
I will be using NumPy to build the
arrays. NumPy has optimized calculation routines for its fundamental
array data structure, which makes it great for scientific computing. It
can be installed via pip install numpy
.
matplotlib is an awesome 2D
plotting library which we will use to visualize the data. It can
similarly be installed via pip install matplotlib
.
We can write two python functions to analyze this conjecture. The
first function, collatzer()
will take the positive integer
and return the number of steps it took to reach 1. The second function,
collatzer_steps()
, will take the positive integer and
return the values of the iteration.
import numpy as np
def collatzer(i):
= 0
count while i > 1:
if i % 2 == 0:
= i // 2
i else:
= (3 * i) + 1
i = count + 1
count return count
def collatzer_steps(i, dtype=int):
= int(i)
i = [i]
steps while i > 1:
if i % 2 == 0:
= i // 2
i
steps.append(i)else:
= (3 * i) + 1
i
steps.append(i)if dtype == int:
return np.array(steps, dtype=np.int64)
else:
return np.array(steps, dtype=float)
Note the usage of numpy
arrays since they tend to be
more efficient. Also the collatzer_steps()
function uses
ìnt64
type by default but can also use float
.
This is because int64
overflows at very large numbers. The
outputs are as follows:
>>> from Collatz import *
>>> collatzer(7)
16
>>> collatzer_steps(7)
7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]) array([
Now we are ready to analyse the conjecture. Let us look at the first 1000 numbers and see how many steps each take to reach 1.
import matplotlib.pyplot as plt
import numpy as np
from Collatz import *
from timeit import default_timer as timer
= timer()
start = 1000
limit = np.zeros(limit, dtype=int)
y = np.arange(limit, dtype=int)
x for n in x:
= collatzer(x[n])
y[n] = plt.subplots()
fig, ax '.', ms=4)
ax.plot(x, y, = timer()
end print(end-start)
plt.show()
By using the timeit
module, we can see that the whole
calculation took about 140ms. Here is the plot:
It looks like two streams coming from 0 and somehwere between 100-120 to meet with another. We can also see peculiarly straight horizontal lines.
For perspective, let’s plot the first million numbers, along with the
function log_{2} x. This function is
particularly interesting because for any power of 2, log_{2} x
is the number of the steps it takes to reach 1. In other words, the number 2^n reaches 1 in n
steps. Modifying the code a little bit, and using
np.savetxt()
to dump the data for future use, we observe
that the operation took 147 seconds for the first million numbers.
import matplotlib.pyplot as plt
import numpy as np
from collatz import *
from timeit import default_timer as timer
= timer()
start = 1000000
limit = np.zeros(limit, dtype=int)
col_y = np.zeros(limit)
log_y = np.arange(limit, dtype=int)
x for n in x:
= collatzer(x[n])
col_y[n] = np.log2(x[n])
log_y[n] 0] = 0 # because log2(0) gives -inf
log_y[= plt.subplots()
fig, ax '.', ms=2)
ax.plot(x, col_y, 'r')
ax.plot(x, log_y, = timer()
end print(end-start)
'collatz_plot_1M.txt', col_y, fmt='%d')
np.savetxt( plt.show()
We can observe that the function log_{2} x acts as a lower bound for the number of steps, and the figure reaches horizontal stability except for some outliers.
Just to utilize the collatzer_steps()
function, let us
see the steps a very large number takes. Since the y axis starts off
quite high and diminishes rather quickly, I will use a \log y axis.
import matplotlib.pyplot as plt
import numpy as np
from collatz import *
= 42 ** 42
number = collatzer_steps(number, dtype=float)
y = np.arange(len(y))
x = plt.subplots()
fig, ax '.', ms=4)
plt.semilogy(x, y, plt.show()